ORIGINAL ARTICLE

Targeted cancer exome sequencing reveals recurrent mutations 
in myeloproliferative neoplasms 

E Tenedini, I Bernardis, V Artusi, L Artuso, E Roncaglia, P Guglielmelli, L Pieri, C Bogani, F Biamonte, G Rotunno,
C Mannarelli, E Bianchi, A Pancrazzi, T Fanelli, G Malagoli Tagliazucchi, S Ferrari, R Manfredini, AM Vannucchi and
E Tagliafico on behalf of AGIMM investigators

With the intent of dissecting the molecular complexity of Philadelphia-negative myeloproliferative neoplasms (MPN), we designed a
target enrichment panel to explore, using next-generation sequencing (NGS), the mutational status of an extensive list of 2000
cancer-associated genes and microRNAs. The genomic DNA of granulocytes and in vitro-expanded CD3+ T-lymphocytes, as a germline control,
was target-enriched and sequenced in a learning cohort of 20 MPN patients using Roche 454 technology. We identified 141 genuine
somatic mutations, most of which were not previously described. To test the frequency of the identified variants, a larger validation
cohort of 189 MPN patients was additionally screened for these mutations using Ion Torrent AmpliSeq NGS. Excluding the genes
already described in MPN, for 8 genes (SCRIB, MIR662, BARD1, TCF12, FAT4, DAP3, POLG and NRAS), we demonstrated a mutation
frequency between 3 and 8%. We also found that mutations at codon 12 of NRAS (NRASG12V and NRASG12D) were significantly
associated, for primary myelofibrosis (PMF), with the highest dynamic international prognostic scoring system (DIPSS)-plus score categories.
This association was then confirmed in 66 additional PMF patients, composing a final dataset of 168 PMF patients showing an NRAS mutation
frequency of 4.7%, which was associated with a worse outcome, as defined by the DIPSS-plus score.
INTRODUCTION

The discovery of the JAK2V617F mutation in 2005 represented a
major breakthrough in the understanding of the molecular
pathogenesis of Philadelphia chromosome-negative chronic
myeloproliferative neoplasms (MPN). The JAK2V617F mutation
is harbored by nearly all polycythemia vera (PV) patients and
50-60% of patients with essential thrombocythemia (ET) and primary
myelofibrosis (PMF). The resulting constitutively activated
JAK/STAT signaling is considered central to the pathogenesis and
phenotype of MPN and therefore serves as a rational drug target
for therapy. Additional mutations were described at MPL exon
10 (W515L/K/A) in 5-8% of ET and PMF patients. However, the
significant proportion of JAK2V617F- and MPLW515L-negative
MPN cases required additional effort to identify novel genetic
lesions contributing to disease pathogenesis. Recent studies based
on single-nucleotide polymorphism (SNP) array-based karyotyping
resulted in the detection of several copy number alterations,
such as loss of heterozygosity and copy-neutral loss of
heterozygosity, in genomic regions containing multiple members
of the polycomb repressive complex 2 and other genes
previously implicated in different hematological malignancies.
Moreover, via candidate gene sequencing, several novel bona fide
somatic mutations were detected at frequencies ranging from
1% to 20-30% in genes frequently mutated in other myeloid
neoplasms, such as MDS and acute myeloid leukemia, and some
of these mutations have been positively correlated with clinical
outcome. Nevertheless, because a significant proportion
of MPN cases are negative for molecular aberrations, a complete
portrait of MPN genetic abnormalities remains to be depicted.
The two major theoretical and technical drawbacks to the
identification of new somatic mutations are represented, respectively,
by the huge number of genes potentially involved in MPN
tumorigenesis and by the availability of 'pure' germline control
DNA. Buccal swabs and saliva have generally been considered as
readily available sources of non-hematopoietic DNA, but detection
of the JAK2V617F mutation in at least some of these samples
suggested the presence of myeloid cell contamination.
In addition, no evidence for germline transmission of the
JAK2V617F mutation has been found to date.

Given these previous results and with the goal of further 
exploring the molecular complexity of MPN, we investigated the 
incidence of mutations in genes already known to be implicated in 
cancer pathogenesis. Therefore, we designed a two-tiered next- 
generation sequencing (NGS) study. We first evaluated the somatic 
mutational status of a mostly inclusive list of known cancer-related 
genes in a 25 MPN sample learning set. We then tested the 
recurrence of the truly somatic variants in a broader validation set of 
189 patients via an amplicon-sequencing NGS approach. 

MATERIALS AND METHODS 

Patients and samples 

Patients were diagnosed as having PV, PMF and post-PV MF according to
the World Health Organization (WHO) and the International Working
Group for Myelofibrosis Research and Treatment (IWG-MRT) criteria.
All subjects provided informed written consent, and the study was
performed under the Florence Institutional Review Board's approved
protocol. The study was conducted in accordance with the Declaration of
Helsinki. The presence of the JAK2V617F and MPLW515L mutations
and the mutated allele burden were determined via quantitative
real-time PCR (QRT-PCR), as previously described.

The mutational status of ASXL1, EZH2, IDH1, as well as IDH2, SRSF2, TET2,
DNMT3A and CBL was assessed using Sanger sequencing, as previously
described. Cytogenetic analysis was performed on Giemsa-stained slides.
All patients were examined within 1 year of diagnosis.

Granulocyte and CD3+ T cell purification and genomic DNA
extraction

Granulocytes were obtained via the density gradient centrifugation of
peripheral blood samples, and CD3+ cells were immunomagnetically
selected (Miltenyi Biotec GmbH, Bergisch Gladbach, Germany) from
the peripheral blood mononuclear cell fraction recovered from the
density gradient. After sorting, CD3+ cells were expanded in vitro. The
purity of the CD3+ cells was determined using flow cytometry. The
culture conditions as well as the DNA extraction quality control procedures
are fully described in the Materials and Methods section of the
Supplementary Information.

Target enrichment and 454 sequencing 

A solution-based capture custom panel was designed for target
enrichment according to the NimbleGen (Roche NimbleGen, Inc., Madison,
WI, USA) guidelines. The final list of genes and miRNAs was obtained by
combining the complete list of mutations present in the latest release of
the Sanger Institute Cancer Gene Census Database (www.sanger.ac.uk/
genetics/CGP/Census) with the most inclusive list of DNA repair genes
present in the OMIM Database (www.ncbi.nlm.nih.gov/omim) as well as a
manually curated literature screening. The custom design was inserted into
a 5-Mb SeqCap EZ Choice Library (Roche NimbleGen, Inc., Madison, WI,
USA) containing the exonic portions of 1400 genes and 600 miRNA coding
sequences (Supplementary Table 1).

Sample library preparation was performed using 500 ng of DNA from
the granulocyte and CD3+ T-cell samples. Each step of the working
procedure to perform sequencing runs on the Roche 454 GS FLX platform 
is fully described in the Materials and Methods section of the 
Supplementary Information. 

Variant detection, filtering and classification 

The processing of the samples, evaluation of genuine somatic mutations 
and their classification are entirely described in the Materials and Methods 
section of the Supplementary Information. 

Recurring variant validation test 

Recurrence testing for genuine somatic variants was performed using Ion 
AmpliSeq technology with an Ion Torrent Personal Genome Machine 
(PGM) platform. The Ion AmpliSeq panel design, sample processing, 
barcoding and sequencing are fully described in the Materials and 
Methods section of the Supplementary Information. 

NRAS c.35G>A mutation analysis

An independent set of 139 patients with PMF were recruited from the
archive in Florence and used for NRAS and KRAS mutation analysis 
(see Materials and Methods section of the Supplementary Information for 
details). 

Statistical analysis 

The χ² or Fisher's exact test (2×2 table) or test for trend (larger
contingency table) were used as appropriate to compare variables from 
different patient groups that had been categorized according to 
mutational status. The analysis of continuous variables among the groups 
was performed using the Mann-Whitney U test (two groups) or the 
Kruskal-Wallis test with the Dunn method for multiple comparisons. 
P<0.05 was considered to indicate statistical significance; all tests 
were two-tailed. Data were processed using SPSS Version 19.0 software 
(StatSoft, Tulsa, OK, USA). 
RESULTS

A total of 25 tumor granulocyte and paired germline samples
comprised the learning cohort. The NGS samples had been
collected at the time of diagnosis in 9 PV subjects and 11 PMF
subjects, while the additional 5 DNA samples were obtained from
5 of the 9 PV patients at the time that they evolved to post-PV
myelofibrosis.

The learning cohort for PMF was deliberately selected as
being predominantly JAK2V617F-negative (9/11 subjects), with
only 2 JAK2V617F-positive patients. The stratification of patients
according to the dynamic international prognostic scoring system
(DIPSS) and other clinical features of the cohort are summarized
in Supplementary Table 2.

Identification of genuine somatic mutations in coding sequences 
and microRNAs 

According to the NimbleGen procedure, the tumor granulocytic, 
germline salivary and CD3+ lymphocytic gDNA samples were
sheared, barcoded and mixed in an appropriate number of 
template libraries that were subsequently captured. At the end of 
the gDNA sequence selection procedure, all the DNA libraries 
were checked using QRT-PCR to verify the relative fold enrich- 
ment of the panel; all libraries produced a fully satisfactory result 
and confirmed successful captures (data not shown). Next, each 
library was sequenced with 454 Titanium technology using the 
Roche GS FLX platform. To exclude unbalanced libraries, that is, 
libraries composed of a nonequimolar quantity of the samples, we 
checked for any possible unequal sequencing depth in the 
barcoded samples. As exemplified for the library shown in 
Supplementary Figure 1, balanced quantities of samples were 
pooled, captured and then processed. 
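
The balance check described above can be illustrated with a minimal sketch. It assumes that reads have already been counted per barcode; the barcode names, read counts and the two-fold tolerance are hypothetical and are not taken from the study.

```python
from statistics import median


def unbalanced_samples(read_counts, max_fold=2.0):
    """Return barcodes whose read count differs from the pool median by more than max_fold.

    read_counts maps a barcode name to its number of reads.
    max_fold is an illustrative tolerance, not a value reported in the paper.
    """
    pool_median = median(read_counts.values())
    flagged = []
    for barcode, n_reads in read_counts.items():
        larger = max(n_reads, pool_median)
        smaller = max(min(n_reads, pool_median), 1)  # avoid division by zero
        if larger / smaller > max_fold:
            flagged.append(barcode)
    return sorted(flagged)


# Hypothetical read counts for four barcoded samples pooled in one library.
pool = {"MID1": 98_000, "MID2": 105_000, "MID3": 101_000, "MID4": 30_000}
print(unbalanced_samples(pool))  # MID4 is under-represented in this toy pool
```

A library in which no barcode is flagged would be treated as balanced under this assumed criterion.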

DNA libraries were sequenced until a median 30-fold coverage
was reached for each tumor or control sample. A very high
capture specificity was observed (94% of unique reads in the
target region) with a similar uniformity throughout the
chromosomes (average standard deviation of 1.6) (Figure 1).

Genuine germline sample selection 

To identify somatic mutations in the MPN learning dataset, we 
sequenced paired DNA samples from granulocytes and germline 
cells. The candidate mutations detected in the granulocytes were 
then screened by subtracting those occurring in the germline to 
enable the identification of the variants that could be reliably 
considered truly acquired somatic variants. 

In the first set of experiments, we employed DNA obtained from
paired saliva samples, but this DNA consistently presented
variants belonging to the neoplastic clone, notably the JAK2V617F
mutation, with a comparable allele burden. These results
prompted us to conclude that saliva samples could not be
considered genuine germline sources due to contamination by
myeloid cells. Thus, we replicated the experiments using
expanded CD3+ T cell DNA for control samples, and a very low
level of somatic contamination was found in just 1 DNA sample
from CD3+ T cells. Table 1 displays the mutational burden
comparison for JAK2, MPL and IDH2 in libraries prepared from
different sources of DNA (granulocytes, saliva and CD3+ T cells).
As a result, we discarded the data obtained from the salivary
samples and selected for further analyses only those obtained
from CD3+ T cells, which were definitively considered germline
control samples.
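
As a rough illustration of this comparison, the sketch below flags control samples in which the JAK2V617F allele burden exceeds a threshold. The allele burdens are those listed in Table 1 (matched to patients by row order); the 1% cut-off and the function name are assumptions made for the example, not criteria stated by the authors.

```python
def contaminated_controls(control_burden, max_burden=0.01):
    """Return the samples whose control DNA carries the tumor variant above max_burden.

    control_burden maps a patient label to the mutated allele fraction (0-1)
    measured in the putative germline control. max_burden is illustrative.
    """
    return sorted(s for s, burden in control_burden.items() if burden > max_burden)


# JAK2V617F allele burden in the two candidate germline sources (Table 1).
saliva = {"PV_4": 0.67, "PMF_2": 0.58, "PV_9-PPV_5": 0.87, "PV_3": 0.78, "PV_6-PPV_2": 0.93}
cd3_t_cells = {"PV_4": 0.0, "PMF_2": 0.0, "PV_9-PPV_5": 0.0, "PV_3": 0.0, "PV_6-PPV_2": 0.14}

print(len(contaminated_controls(saliva)))   # every saliva sample is flagged
print(contaminated_controls(cd3_t_cells))   # a single CD3+ T cell sample is flagged
```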

Somatic mutations identification in genes and microRNAs in 
the learning cohort 

The tumor and paired germline sample data were both mapped
against the human reference genome (hg19). A total of 11 006 and
9691 unique variants in 1057 and 1039 genes for the germline and
Figure 1. Enrichment uniformity landscape. The X-axis graphs the
number of target regions included in the NimbleGen capture 'cancer
exome' panel (approximately 29600 target regions in total).
The light bars represent the number of target regions within the
design, whereas the dark bars correspond to the effective number of
enriched target regions for each chromosome. The Y-axis displays
the chromosomes.



Table 1. Comparison of allele burden for selected mutations in
libraries prepared from different DNA sources

Patient        Granulocytes   CD3+ T cells   Salivary samples

JAK2 V617F
PV_4           64%            0%             67%
PMF_2          76%            0%             58%
PV_9-PPV_5     96-83%         0%             87%
PV_3           62%            0%             78%
PV_6-PPV_2     67-100%        14%            93%

IDH2 R140Q
PMF_9          55%            0%             37%

MPL W515L
PMF_9          80%            0%             38%



somatic samples, respectively, were detected, as shown in 
Supplementary Table 3. 

The somatic variant identification procedure was intended to 
minimize the false positive somatic mutation rate. To this end, we 
used a two-step stringent approach. First, using a 'somatic' filter, 
we selected only DNA variants in tumor samples with no mutated 
reads either in the paired germline counterpart or in any other 
germline samples of the cohort. A 'functional' filter was then 
applied to eliminate the synonymous variants and all variants 
annotated in the 1000 Genomes database as having a frequency 
higher than 1% (see Materials and Methods for details). 
Supplementary Table 3 summarizes the narrowing of the unique 
detected variants after each of the analysis steps described above. 
Then, to evaluate possible sequencing errors and the false
positive rate, we validated these variants using a different type of
NGS technology; specifically, we designed an Ion AmpliSeq
panel containing all detected variants that was employed to
re-sequence the same patient cohort via Ion Torrent PGM
(1000-fold coverage).
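
The two-step filtering logic can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: variants are represented by a hypothetical `Variant` record, germline calls are reduced to (gene, change) keys with at least one mutated read, and all gene names other than JAK2 are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str          # amino acid change, e.g. "V617F"
    synonymous: bool
    freq_1000g: float    # allele frequency in 1000 Genomes, 0.0 if not annotated


def genuine_somatic(tumor, germline, max_pop_freq=0.01):
    """Apply the 'somatic' and 'functional' filters to tumor calls.

    tumor maps patient -> list of Variant called in granulocytes.
    germline maps patient -> set of (gene, change) keys seen in germline DNA.
    """
    # A variant seen in ANY germline sample (paired or not) is discarded.
    seen_in_germline = set().union(*germline.values())
    kept = {}
    for patient, variants in tumor.items():
        kept[patient] = sorted(
            (v.gene, v.change)
            for v in variants
            if (v.gene, v.change) not in seen_in_germline   # 'somatic' filter
            and not v.synonymous                            # 'functional' filter: drop synonymous
            and v.freq_1000g <= max_pop_freq                # 'functional' filter: drop polymorphisms
        )
    return kept


# Toy data: GENE_A, GENE_B and GENE_C are hypothetical placeholders.
tumor_calls = {
    "P1": [
        Variant("JAK2", "V617F", False, 0.0),
        Variant("GENE_A", "L10L", True, 0.0),     # synonymous
        Variant("GENE_B", "A20T", False, 0.20),   # common polymorphism
        Variant("GENE_C", "R30H", False, 0.0),    # present in another patient's germline
    ],
    "P2": [],
}
germline_calls = {"P1": set(), "P2": {("GENE_C", "R30H")}}

print(genuine_somatic(tumor_calls, germline_calls))
```

Screening against every germline sample of the cohort, rather than only the paired one, is what makes the 'somatic' filter stringent in this sketch.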

Using this strategy, we estimated a very low sequencing 
error rate (<1%), and we finally confirmed 136 genuine somatic, 
non-synonymous mutations affecting 121 genes. Twenty-five 
percent of these mutations are indexed in the dbSNP archive,
and 2% of these specific variants are listed in the COSMIC catalogue.
The majority of mutations (89%) were estimated to be 'damaging'
by at least 1 of the 5 algorithms that we used to investigate
disease-causing potential (PolyPhen2, SIFT, Provean,
MutationTaster, LRT) (Supplementary Table 4 and Figure 2).

The vast majority of the identified somatic mutations were
missense (92%), whereas the minority (8%) were indels (small
insertions and deletions). Although patients harbored different
numbers of somatic mutations, ranging from 1 to 21 variants
(Table 2), only 14 genes were recurrently mutated in at least
two patients (Figure 3). Notably, in patients for whom samples
were available at both disease phases, the acquisition of
additional mutations and/or the loss of some mutations during
evolution from PV to post-PV myelofibrosis suggested the
occurrence of sub-clone selection during disease evolution
(Supplementary Table 5).

Five missense variants were identified in the 600 microRNA
coding sequences tested. These missense variants are summarized
in Table 3. The MIR662, MIR663 and MIR542 sequences harbored 
missense mutations in their stem-loop coding region, and the 
MIR17 mutation was shown to affect the miR-17-5p mature miRNA 
sequence 5 bases downstream of its seed region. 



Somatic mutation recurrence in the validation cohort 
To distinguish, among the identified somatic variants,
possible novel clonal drivers from clonal passenger mutations,
we tested the recurrence of the above-described variants in a
broader cohort of 189 patients diagnosed with PMF (91 samples, 
48.2%), PV (50 patients, 26.4%) or post-PV myelofibrosis (48 
samples, 25.4%). We utilized the Ion AmpliSeq panel and PGM 
sequencing as previously described to obtain an ultra-deep 
amplicon sequencing (see Supplementary Information for techni- 
cal details) of the 141 variants, achieving a sample median of 
1000-fold coverage. The clinical parameters of the patients 
comprising the validation set are summarized in Supplementary 
Table 2. 

Excluding the JAK2, MPL, IDH2, ASXL1, TET2, CBL and DNMT3A
known variants, 80 patients (42% of total) harbored at least 1 of
the 141 somatic mutations tested for recurrence. Thirty somatic
mutations (18.9% of the total) were detected in at least 1 of the
189 patients; these were all missense mutations with the
exception of a single frameshift mutation occurring in the BRD4
gene (Table 4).
Figure 2. Circular diagram of mutations found in MPN. Chromosomes are illustrated in the outer perimeter. Grey dots show the 'cancer exome'
regions of the NimbleGen panel, whereas the histograms show the captured (blue) and failed (red) target regions. MicroRNA or Gene Symbol
with amino acid change refers to the variants found in our cohort.



In addition, 8 genes (SCRIB, MIR662, BARD1, TCF12, FAT4, DAP3,
POLG and NRAS) appeared as recurrently mutated in the cohort,
some of which (SCRIB 7.9%, MIR662 7.4% and BARD1 5.3%) were
more frequently mutated than previously identified, well-known
mutational hotspots (Table 4).
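
The frequencies quoted here follow directly from the carrier counts in Table 4 and the subgroup sizes of the validation cohort (50 PV, 48 post-PV MF and 91 PMF). The short sketch below reproduces the arithmetic for SCRIB and MIR662; the function name and data layout are illustrative only.

```python
# Validation cohort sizes as stated in the text (189 patients in total).
COHORT = {"PV": 50, "PPV": 48, "PMF": 91}


def mutation_frequencies(carriers):
    """Return per-subgroup frequencies (%) and the overall frequency (%)."""
    per_group = {g: round(100 * carriers[g] / COHORT[g], 2) for g in COHORT}
    overall = round(100 * sum(carriers.values()) / sum(COHORT.values()), 1)
    return per_group, overall


# Carrier counts (PV, PPV, PMF) as listed in Table 4.
scrib = mutation_frequencies({"PV": 5, "PPV": 5, "PMF": 5})
mir662 = mutation_frequencies({"PV": 2, "PPV": 6, "PMF": 6})

print(scrib)    # ({'PV': 10.0, 'PPV': 10.42, 'PMF': 5.49}, 7.9)
print(mir662)   # ({'PV': 4.0, 'PPV': 12.5, 'PMF': 6.59}, 7.4)
```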



Correlations between recurrent mutations and clinical features: 
NRAS c.35 G>A mutation analysis 

The groupwise associations between recurrent mutations
and clinical and biological features were assessed using the
χ² test or Fisher's exact test. Possibly because of the small number
of subjects harboring each individual mutation, no
significant association with clinical features was found, with the
exception of mutations at codon 12 of NRAS (NRASG12V and
NRASG12D), which occurred in 5 out of 102 PMF patients included in
the learning and validation cohorts. These mutations were associated
with the highest DIPSS-plus score categories (P<0.05).
DIPSS-plus effectively combines prognostic information from the DIPSS
with karyotype, platelet count, and transfusion status to predict
overall survival in PMF. This evidence prompted us to screen the
NRAS gene for mutations in codon 12 in an independent cohort of
66 PMF patients via high-resolution melting analysis followed by
Sanger sequencing validation, finding an additional 3 mutated
subjects. Moreover, because NRAS and KRAS mutations have been
described as possibly mutually exclusive, we also tested this
cohort for KRAS mutations. As a whole, we found 8 of 168 PMF
patients (4.7%) harboring a heterozygous NRAS mutation in codon
Table 2. Number of genuine somatic, non-synonymous mutations harboured by each patient. Variants in MPN known mutated genes are shown in bold

Disease_patient number   Number of somatic mutations   Mutations

PMF_1    4    FIP1L1D183G, JAK2V617F, PRPF19L341P, TPRN1790D
PMF_2    6    APCR2126G, BRCA1R163G, BRCA2K1690N, CTNNA1P735L, GAS8L126P, JAK2V617F
PMF_3    7    APLFR510fsX3, BRIP1L45dP, DLGAP2R134K, ERBB2L494F, FANCMR75dC, MPGR55C, RECQL4P879H
PMF_4    5    DNMT3AK693C, IRF4T361A, PLCD1R476C, PTPRFE1245K, ZFHX3K3243R
PMF_5    3    EME1Q55dX, RABEP1E550G, SEPT6V168A
PMF_6    5    BARD1C557S, PER1L214F, SEZ6LT1014A, SEZ6LT381N, SYKV560A
PMF_7    10   ACSL6A52V, AIFM2G2V, AKT1W99R, ATRG1362E, CARSQ253R, IRF4R201H, RNF6R75W, TP53G245D, TP73E634K,
              TRHR83H
PMF_8    7    CDC25AC159Y, EPS15Q365R, FAT1E3812G, MPLW515L, NRASG12V, RFWD2M299V, TCF12G300S
PMF_9    10   ASXL1K825X, DAB2IPN230D, FAT2T858I, HIF1AW31dR, IDH2R140Q, KDM5CC1247X, MPLP565L, MPLW515L,
              NPM1A249G, NRASG12V
PMF_10   3    CASP2D169G, HINT1D68G, SMAD2M411V
PMF_11   4    ATICL384R, C9orf102N534S, NOVA2A464V, RBBP6T1711A
PPV_1    2    JAK2V617F, STK11M127R
PPV_2    10   BARD1G203EfsX10, CHD5D1271N, CUX1E1065G, DNASE1V237A, DUSP6D124G, JAK2V617F, NF1G2103R,
              NFIPI7O0R, NOTCH1R1279H, PRDM2C1195X
PPV_3    3    JAK2V617F, TET2K550X, TMEM115G7D
PPV_4    7    CBLD460delD, DAP3G5E, GOLGA5S115fsX3, JAK2V617F, MTUS1P1077fsX20, NCOA1P204S, SCRIBH1217P
PPV_5    7    BRCA2D1540G, IP6K2G28R, JAK2V617F, MGMTPISS, NTRK1Y72H, POLII261M, XRCC4E121V
PV_1     4    FAT4R175L, JAK2V617F, RAG2S111P, RBM15R703G
PV_2     1    JAK2V617F
PV_3     21   ABL2E362G, BARD1K415R, C11orf30E916G, DLGAP2P572fsX72, FAT1G610R, FAT4T3251A, JAK2V617F,
              MAD1L1R54G, MECOMF522S, MLLD251G, MYH9F117L, NINV2041A, PARP4D1547G, POLEF2063L, POLKS832N,
              PRDM1L414P, RAD54BG758S, RAG2S368P, TSG101I128V, UIMC1K6R, VBP1D191G
PV_4     6    BRD4E49fsX42, JAK2V617F, LIG3A836T, POLKT405I, PTPRGV426M, XAB2E26K
PV_5     7    APCA2128V, CBFA2T3C169Y, JAK2V617F, PPP2R1BG161D, PTPN23P1099S, ST5L636fsX7, STK11M127R
PV_6     5    BRD3T250A, JAK2V617F, MKL1G473R, NF1G2103R, NOTCH1R1279H
PV_7     7    DPH1S311R, HOXA11M294fsX23, JAK2V617F, PARP3M216V, ROBO1D195G, TET2R550X, TMEM115G7D
PV_8     6    CARSK482R, CSNK1EN172D, JAK2V617F, PTPN14W324X, SCRIBH1217P, XRCC1G188R
PV_9     11   APTXC286fsX1, EP300M1470fsX2, FAT2G3691R, JAK2V617F, METV1247A, MLL3A2456T, MYH11K1761R,
              MYST3G443S, POLGA154T, POLII261M, XRCC4E121V

12 (5 harbored the NRASG12V mutation and 3 harbored the
NRASG12D mutation). In addition, 3 patients of 168 evaluated
(1.6%) harbored heterozygous KRAS mutations (G12R, G12S,
and G13D); the patient carrying the KRASG13D mutation also
harbored the NRASG12V variant. Of note, NRAS variants
preferentially clustered among JAK2 wild-type subjects since
only 1 of 8 NRAS-mutated patients also harbored the JAK2V617F mutation.
Finally, we confirmed a significant association of NRAS variants 
with DIPSS-plus scoring (P = 0.022) since all the 8 mutated 
patients were included in the highest (intermediate-2 and high) 
risk category, as shown in Table 5. 
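
To illustrate how such an association can be tested, the sketch below implements a two-sided Fisher's exact test with the standard library only. The 2×2 layout is an assumption made for the example: DIPSS-plus categories from Table 5 are collapsed into low/intermediate-1 versus intermediate-2/high, which may differ from the contingency table and test the authors actually used, so the result is not expected to match the reported P = 0.022 exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The P value sums the hypergeometric probabilities of all tables with the
    same margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the first cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )


# Rows: NRAS mutated, NRAS wild-type; columns: low/int-1, int-2/high (Table 5).
p_value = fisher_exact_two_sided(0, 8, 16 + 49, 65 + 30)
print(f"two-sided P = {p_value:.4f}")
```

Under this assumed dichotomization the P value falls below the 0.05 threshold used in the study.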

DISCUSSION 

The discovery of the JAK2V617F mutation represents the
single most significant contribution to the characterization of
the pathophysiology of MPN thus far and also has major implications
for the treatment of these malignancies. Mutations in genes
other than JAK2 involve less than 20% of MPN
patients and often co-occur with the JAK2 mutation; thus,
even though the impact of the mutational status of a specific set
of genes (ASXL1, EZH2, SRSF2 and IDH) on disease outcome
has been demonstrated in patients with PMF, a comprehensive
molecular landscape of MPN has not yet been completely
depicted. Here, we performed the first large targeted
NGS analysis aimed at exploring the mutational status of the
broadest panel of known cancer-associated genes in MPN. This is
the first report describing a robust NGS study design and an
accurate data analysis pipeline aimed at minimizing the somatic
mutation false positive rate. We also demonstrated that saliva
samples are often heavily contaminated by myeloid cells and that
expanded CD3+ T cells in culture therefore serve as the most




Figure 3. Heatmap of the known variants found and of genes
presenting one or more variants in two or more patients in the data
set. The horizontal axis presents the complete sequenced dataset of
patients, with the PV samples grouped on the left, the patients who
evolved to post-PV MF (PPV) in the center and the PMF on the right.
The vertical axis illustrates the recurrently mutated genes as
exemplified in the legend.
Table 3. Somatic mutations in microRNA coding regions

miRNA ID   Chromosome        SNP          Variant effect   miRNA location   Region

MIR662     chr16:820215      rs74656628   T>A p.T924S      MSLNL gene       Pre-miRNA
MIR17      chr13:92002884    unknown      A>G              intergenic       Mature sequence hsa-miR-17-5p
MIR19A     chr13:92003195    unknown      T>C              intergenic       Pre-miRNA
MIR542     chrX:133675465    unknown      C>T              intergenic       Pre-miRNA
MIR663A    chr20:26188912    rs7266947    A>C              intergenic       Pre-miRNA


Table 4. Recurrent mutations in the validation dataset

Chromosome   Reference    Variant   Gene     AA       Pathology       Freq, %                Freq tot,   PV thrombotic risk   PPV-PMF IPSS
             nucleotide                      change   (PV|PPV|PMF)    (PV|PPV|PMF)           %           (Low|High)           (Low|Int-1|Int-2|High|Unknown)

chr9         G            T         JAK2     V617F    50|48|72        100.00|100.00|79.12    90.0        14|36                17|33|41|21|8
chr8         T            G         SCRIB    H1217P   5|5|5           10.00|10.42|5.49       7.9         2|3                  1|1|4|4|0
chr16        T            A         MIR662            2|6|6           4.00|12.50|6.59
reliable germline control for identifying true somatic mutations in
MPN. We set up an analysis pipeline using the most stringent
procedure to avoid false-positive calls of somatic mutations.
In particular, paired germline and somatic DNA samples of the
learning dataset were sequenced to reach the same fold
coverage. Moreover, we called somatic only those variants with no
reads both in the paired germline DNA and in any other germline
sample of the cohort. To these stringent 'somatic' filters, two
additional controls were added to discard any possible
polymorphisms (only variants with a frequency <1% in the
1000 Genomes database were retained) as well as possible benign
mutations (only non-synonymous variants were retained). Finally,
all filtered variants were annotated with the functional effect
prediction of five different algorithms.

Using this multistep bioinformatics pipeline, we finally
identified 141 'genuine' somatic non-synonymous mutations affecting
121 genes and 5 miRNAs that were then tested for recurrence in a
larger cohort of 189 patients. The variants found in the SCRIB,
MIR662, BARD1, TCF12, FAT4, DAP3, POLG and NRAS genes were
recurrent with a frequency higher than 3%. In particular, SCRIB,
MIR662 and BARD1 showed frequencies of 7.9, 7.4, and 5.3%,
respectively, which were higher than those described for some
well-known mutational hotspots.

Some findings appear to link some of these genes, suggesting
a potential role for them in the pathogenesis of MPN.
SCRIB and FAT4 are two proteins that regulate planar cell polarity
differentiation. SCRIB encodes a cytoplasmic scaffolding
protein consisting of leucine-rich repeats and PDZ domains that
regulates protein-protein interactions, while FAT4 belongs to
the E-cadherin family and may control noncanonical Wnt/planar
cell polarity signaling; both pathways play a crucial role in the
regulation of polarity and tissue homeostasis. Interestingly, the TCF12
protein, also known as HEB, could be linked with this pathway.
TCF12 forms heterodimers with other bHLH E-proteins and with the
chimeric protein AML1-ETO and works as a transcriptional
repressor of E-cadherin, thus playing an important role in cancer
cell progression by enhancing the epithelial-mesenchymal
transition process. The endothelial-mesenchymal transition is a
form of the more widely known epithelial-mesenchymal transition;
similarly to epithelial-mesenchymal transition,
endothelial-mesenchymal transition can be induced by transforming growth
factor-β and allows a polarized cell, which normally interacts with
Table 5. Clinical features of PMF patients screened for NRAS mutations

Variable                                RAS wild-type   RAS mutated   P

DIPSS-plus risk category; n (%)                                       0.022
  Low                                   16 (10%)        0
  Intermediate-1                        49 (30.6%)      0
  Intermediate-2                        65 (40.6%)      6 (75%)
  High                                  30 (18.8%)      2 (25%)
Palpable spleen; n (%)                  98 (61.2%)      4 (50%)       ns
JAK2V617F; n (%)                        96 (60%)        1 (12.5%)     0.001
Progression to acute leukemia; n (%)    12 (7.5%)       0             ns
Dead for disease progression; n (%)     49 (30.6%)      5 (62.5%)     0.043

Evaluated on available data (n = 100/160 for RAS wild-type and 8/8 for RAS mutated). Abbreviations: DIPSS-plus, dynamic international prognostic scoring
system-plus; IPSS, international prognostic scoring system.



the basement membrane via its basal surface, to undergo multiple
changes that enable it to assume a mesenchymal cell phenotype,
which includes enhanced migratory capacity, invasiveness,
elevated resistance to apoptosis and an increased ability to
induce fibrosis. Interestingly, endothelial-mesenchymal transition
and the resulting endothelial cell fate have been recently
implicated in the pathogenesis of PMF.

The SCRIBH1217P variant is a missense mutation predicted to be
damaging via SIFT and PolyPhen2. In particular, missense
mutations in the same SCRIB C-terminal domain region
significantly disrupt the membrane subcellular localization of the
protein, and this was suggested to be one possible pathogenic
mechanism for planar cell polarity alterations in mammals.
Similarly, the FAT4R175L mutation is located in the extracellular
cadherin domain, and PolyPhen2 together with MutationTaster
predicted that this FAT4 mutation
adversely affects protein function and alters the planar cell
polarity environment. Furthermore, studies in Drosophila and
mammalian cell lines have shown that SCRIB loss and
RAS activation cooperate (interclonally or intraclonally) to
promote invasion. In Drosophila, interclonal cooperation in
RasV12 and scrib-minus tumor clones revealed a two-level
mechanism in which scrib-minus cells promote the neoplastic
development of RasV12 cells. Specifically, this mechanism involves
(1) the spread of stress-induced JNK activity from scrib-minus cells
to RasV12-activated cells followed by (2) the expression of
JAK/STAT-activating cytokines downstream of JNK.

Further insights into the molecular and pathogenetic
complexity of MPN emerged from the discovery of BARD1, POLG and
DAP3 mutations. All three of these variants are predicted to be
damaging, and while DAP3 and POLG are mitochondrial proteins
involved in DNA repair and apoptosis pathways, BARD1 is a
nuclear BRCA1-independent mediator between genotoxic stress
and p53-dependent apoptosis (see the Discussion section of the
Supplementary Information for details).

In addition to the above considerations, we were intrigued by the
fact that the two PMF patients in the learning cohort showing
the NRASG12V mutation presented a rapid progression into
an accelerated form of the disease. Therefore, we attempted
to validate this association in an additional 66 PMF cases,
composing a final cohort of 168 PMF patients. Moreover, because NRAS
and KRAS mutations have been described as potentially mutually
exclusive, we tested this additional cohort for KRAS mutations
to verify mutual exclusivity. We found that 4.7% of the
PMF patients harbored a heterozygous NRAS mutation in codon
12. Considering all 8 NRAS-mutated patients
together, we found that this mutation was associated with a
poorer prognosis, as supported by the fact that the 8 patients
clustered in the intermediate-2 and high-risk categories of the
DIPSS-plus score. Only 2% of all PMF patients harbored
heterozygous KRAS mutations (G12R, G12S and G13D), while the
co-occurrence of NRASG12V and KRASG13D or JAK2V617F
mutations was observed only for a single patient (we were
unable to determine whether the two mutations arose from the
same or different clones since frozen cell samples were not
available for this patient). The low number of patients harboring
KRAS mutations alone precluded any meaningful analysis of the
association with disease progression. Overall, these data suggest
that NRAS mutations specifically associate with a poorer outcome,
although the molecular mechanism remains to be investigated.

Mutations in microRNAs warrant a separate discussion. Even if
additional specific studies are necessary to support a functional
role of the MIR662 variant rs74656628, some interesting features
are worth mentioning. The human MIR662 is an intragenic
microRNA that resides in a non-coding exon sequence of the
mesothelin-like gene. The variant is a heterozygous missense mutation
that occurs in the precursor sequence of the microRNA.
A bioinformatics prediction analysis run with in-silico-Dicer
and RNAfold suggested that this mutation could modify the RNA
secondary structure and lead to the production of a different
mature miRNA (see Supplementary Information for details).

In summary, this NGS study presents new data that contribute 
to elucidating the very high genomic complexity in MPN disorders 
and identifies new variants in cancer-related genes that are 
potentially involved in the pathogenesis of the disease and may 
deserve further studies. 